155 research outputs found

Instantaneous PSD Estimation for Speech Enhancement based on Generalized Principal Components

Author: Dietzen Thomas
Moonen Marc
van Waterschoot Toon
Publication date: 01/07/2020

Power spectral density (PSD) estimates of various microphone signal components are essential to many speech enhancement procedures. As speech is highly non-stationary, performance improvements may be gained by maintaining time-variations in PSD estimates. In this paper, we propose an instantaneous PSD estimation approach based on generalized principal components. Similar to other eigenspace-based PSD estimation approaches, we rely on recursive averaging in order to obtain a microphone signal correlation matrix estimate to be decomposed. However, instead of estimating the PSDs directly from the temporally smooth generalized eigenvalues of this matrix, yielding temporally smooth PSD estimates, we propose to estimate the PSDs from newly defined instantaneous generalized eigenvalues, yielding instantaneous PSD estimates. The instantaneous generalized eigenvalues are defined from the generalized principal components, i.e. a generalized eigenvector-based transform of the microphone signals. We further show that the smooth generalized eigenvalues can be understood as a recursive average of the instantaneous generalized eigenvalues. Simulation results comparing the multi-channel Wiener filter (MWF) with smooth and instantaneous PSD estimates indicate better speech enhancement performance for the latter. A MATLAB implementation is available online.
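The relationship between smooth and instantaneous estimates can be illustrated with a single-channel analogue. The sketch below is not the paper's generalized-eigenvalue method; it only assumes a first-order recursive average with a hypothetical forgetting factor `alpha`, and shows that the smooth estimate equals an exponentially weighted sum of the instantaneous powers.

```python
def recursive_average(instantaneous, alpha):
    """First-order recursive average: s[l] = alpha * s[l-1] + (1 - alpha) * p[l]."""
    smooth = []
    state = 0.0  # assumed zero initial state
    for value in instantaneous:
        state = alpha * state + (1.0 - alpha) * value
        smooth.append(state)
    return smooth


def explicit_average(instantaneous, alpha, frame):
    """The same estimate written as an exponentially weighted sum up to `frame`."""
    return (1.0 - alpha) * sum(
        alpha ** (frame - i) * instantaneous[i] for i in range(frame + 1)
    )


# Toy signal with two short bursts (made-up values, for illustration only).
samples = [0.1, 0.1, 2.0, 1.5, 0.1, 0.1, 0.1, 1.8]
instantaneous_power = [s * s for s in samples]
smooth_power = recursive_average(instantaneous_power, alpha=0.8)

for frame, (inst, smooth) in enumerate(zip(instantaneous_power, smooth_power)):
    print(frame, round(inst, 4), round(smooth, 4))
```

In this toy case the instantaneous powers follow the bursts frame by frame, while the recursive average lags behind and smears them over time.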

arXiv.org e-Print Archive

On the Convergence of the Multipole Expansion Method

Author: De Sena Enzo
Fitzpatrick Brian
van Waterschoot Toon
Publication date: 03/06/2021

The multipole expansion method (MEM) is a spatial discretization technique that is widely used in applications that feature scattering of waves from circular cylinders. Moreover, it also serves as a key component in several other numerical methods in which scattering computations involving arbitrarily shaped objects are accelerated by enclosing the objects in artificial cylinders. A fundamental question is that of how fast the approximation error of the MEM converges to zero as the truncation number goes to infinity. Despite the fact that the MEM was introduced in 1913, and has been in widespread usage as a numerical technique since as far back as 1955, to the best of the authors' knowledge, a precise characterization of the asymptotic rate of convergence of the MEM has not been obtained. In this work, we provide a resolution to this issue. While our focus in this paper is on the Dirichlet scattering problem, this is merely for convenience and our results actually establish convergence rates that hold for all MEM formulations irrespective of the specific boundary conditions or boundary integral equation solution representation chosen.

Comment: 21 pages, 2 figures; Corrected a scaling error that occurred when plotting the third columns of Figs 1,2,3, some very minor grammatical edits to the intro/conclusion to improve clarity and conciseness, included funding info in first page; updated intro with historical info; reformatted several sections to reduce no. of pages; changed title, shortened abstract; fixed typo in proof of Thm 1.
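For orientation, the role of the truncation number can be seen in the textbook special case of a single sound-soft (Dirichlet) circular cylinder hit by a plane wave. This is an illustrative sketch, not the general multi-cylinder setting of the paper; the symbols $k$ (wavenumber), $a$ (cylinder radius), and $N$ (truncation number) are assumptions introduced here.

```latex
% Incident plane wave expanded in cylindrical harmonics (Jacobi-Anger):
u^{\mathrm{inc}}(r,\theta) = e^{\mathrm{i} k r \cos\theta}
  = \sum_{n=-\infty}^{\infty} \mathrm{i}^{n} J_n(kr)\, e^{\mathrm{i} n \theta}

% Truncated multipole expansion of the scattered field:
u^{\mathrm{s}}_{N}(r,\theta)
  = \sum_{n=-N}^{N} a_n\, H^{(1)}_n(kr)\, e^{\mathrm{i} n \theta}

% Dirichlet condition u^{inc} + u^{s} = 0 on r = a fixes the coefficients:
a_n = -\,\mathrm{i}^{n}\, \frac{J_n(ka)}{H^{(1)}_n(ka)}
```

The convergence question in the abstract concerns how quickly the error of such a truncated sum vanishes as $N$ grows.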

arXiv.org e-Print Archive

University of Surrey

Low-Complexity Steered Response Power Mapping based on Nyquist-Shannon Sampling

Author: De Sena Enzo
Dietzen Thomas
van Waterschoot Toon
Publication date: 22/07/2021

The steered response power (SRP) approach to acoustic source localization computes a map of the acoustic scene from the frequency-weighted output power of a beamformer steered towards a set of candidate locations. Equivalently, SRP may be expressed in terms of time-domain generalized cross-correlations (GCCs) at lags equal to the candidate locations' time-differences of arrival (TDOAs). Due to the dense grid of candidate locations, each of which requires inverse Fourier transform (IFT) evaluations, conventional SRP exhibits a high computational complexity. In this paper, we propose a low-complexity SRP approach based on Nyquist-Shannon sampling. Noting that on the one hand the range of possible TDOAs is physically bounded, while on the other hand the GCCs are bandlimited, we critically sample the GCCs around their TDOA interval and approximate the SRP map by interpolation. In usual setups, the number of sample points can be orders of magnitude less than the number of candidate locations and frequency bins, yielding a significant reduction of IFT computations at a limited interpolation cost. Simulations comparing the proposed approximation with conventional SRP indicate low approximation errors and equal localization performance. MATLAB and Python implementations are available online.
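The sampling-and-interpolation idea can be sketched for a single microphone pair. This is a minimal illustration under stated assumptions, not the authors' implementation: the cross-spectrum is synthetic (unit magnitude, pure delay), the band limit `f_max` is chosen somewhat above the highest frequency bin, the interpolator is a truncated sinc with a hypothetical `margin` of extra samples, and all names are invented for this example.

```python
import cmath
import math


def gcc(cross_spectrum, freqs, tau):
    """GCC at lag tau (seconds): real part of an inverse Fourier sum over all bins."""
    total = sum(
        (s * cmath.exp(2j * math.pi * f * tau)).real
        for s, f in zip(cross_spectrum, freqs)
    )
    return total / len(freqs)


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def interpolate(samples, sample_lags, period, tau):
    """Truncated sinc interpolation of the sampled GCC at an arbitrary lag."""
    return sum(g * sinc((tau - t) / period) for g, t in zip(samples, sample_lags))


# Synthetic setup (all values are assumptions for illustration).
f_max = 4000.0                                # assumed band limit in Hz
freqs = [50.0 * k for k in range(1, 61)]      # frequency bins, 50 Hz to 3 kHz
true_tdoa = 0.23e-3                           # seconds
cross = [cmath.exp(-2j * math.pi * f * true_tdoa) for f in freqs]

tdoa_max = 0.5e-3                             # physical bound on |TDOA|
period = 1.0 / (2.0 * f_max)                  # sampling period from the band limit
margin = 8                                    # extra samples beyond the TDOA interval
n_side = math.ceil(tdoa_max / period) + margin
sample_lags = [n * period for n in range(-n_side, n_side + 1)]
samples = [gcc(cross, freqs, t) for t in sample_lags]   # the only IFT evaluations

# Dense grid of candidate TDOAs, evaluated by interpolation instead of IFTs.
candidates = [i * 1e-6 for i in range(-500, 501)]
approx = [interpolate(samples, sample_lags, period, t) for t in candidates]
exact = [gcc(cross, freqs, t) for t in candidates]      # conventional evaluation

best = candidates[max(range(len(candidates)), key=approx.__getitem__)]
max_err = max(abs(a - e) for a, e in zip(approx, exact))
print(len(sample_lags), len(candidates), best, max_err)
```

Here a few dozen GCC samples stand in for about a thousand candidate lags; in a full SRP map the same sampled GCCs would be reused across all candidate locations and summed over microphone pairs.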

arXiv.org e-Print Archive

University of Surrey

Detection and restoration of click degraded audio based on high-order sparse linear prediction

Author: Adugna Eneyew
Derebssa Bisrat
Eneman Koen
van Waterschoot Toon
Publication venue: Addis Ababa University printing
Publication date: 21/08/2022

Clicks are short-duration defects that affect most archived audio media. Linear prediction (LP) modeling for the representation and restoration of audio signals that have been corrupted by click degradation has been extensively studied. The use of high-order sparse linear prediction for the restoration of click-degraded audio given the time location of samples affected by click degradation has been shown to lead to significant restoration improvement over conventional LP-based approaches. For the practical usage of such methods, the identification of the time location of samples affected by click degradation is critical. High-order sparse linear prediction has been shown to lead to better modeling of audio, resulting in better restoration of click-degraded archived audio. In this paper, the use of high-order sparse linear prediction for the detection and restoration of click-degraded audio is proposed. Results in terms of click duration estimation, SNR improvement and perceptual audio quality show that the proposed approach based on high-order sparse linear prediction leads to better performance compared to state-of-the-art LP-based approaches.
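The basic principle behind LP-based click detection is that a click produces a large prediction residual. The sketch below uses a fixed order-2 predictor that is exact for a pure sinusoid, which is far simpler than the high-order sparse predictor proposed in the paper; the signal, the click, and the threshold are all made-up values for illustration.

```python
import math


def lp_residual(signal, coeffs):
    """Prediction error e[n] = x[n] - sum_i coeffs[i] * x[n-1-i]."""
    order = len(coeffs)
    residual = [0.0] * len(signal)
    for n in range(order, len(signal)):
        prediction = sum(c * signal[n - 1 - i] for i, c in enumerate(coeffs))
        residual[n] = signal[n] - prediction
    return residual


def detect_clicks(signal, coeffs, threshold):
    """Flag samples whose prediction error magnitude exceeds the threshold."""
    residual = lp_residual(signal, coeffs)
    return [n for n, e in enumerate(residual) if abs(e) > threshold]


# A sinusoid obeys x[n] = 2*cos(w)*x[n-1] - x[n-2], so these coefficients
# are assumed known here rather than estimated from the data.
omega = 2.0 * math.pi * 0.05
coeffs = [2.0 * math.cos(omega), -1.0]

clean = [math.sin(omega * n) for n in range(200)]
degraded = list(clean)
degraded[100] += 0.8                      # a single-sample click

flagged = detect_clicks(degraded, coeffs, threshold=0.1)
print(flagged)
```

A click at one sample disturbs the residual over the following samples as well (as many as the predictor order), which is one reason click duration estimation is treated as a separate performance measure.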

AJOL - African Journals Online

Sampling Rate Offset Estimation and Compensation for Distributed Adaptive Node-Specific Signal Estimation in Wireless Acoustic Sensor Networks

Author: Didier Paul
Doclo Simon
Moonen Marc
van Waterschoot Toon
Publication date: 04/11/2022

Sampling rate offsets (SROs) between devices in a heterogeneous wireless acoustic sensor network (WASN) can hinder the ability of distributed adaptive algorithms to perform as intended when they rely on coherent signal processing. In this paper, we present an SRO estimation and compensation method to allow the deployment of the distributed adaptive node-specific signal estimation (DANSE) algorithm in WASNs composed of asynchronous devices. The signals available at each node are first utilised in a coherence-drift-based method to blindly estimate SROs which are then compensated for via phase shifts in the frequency domain. A modification of the weighted overlap-add (WOLA) implementation of DANSE is introduced to account for SRO-induced full-sample drifts, permitting per-sample signal transmission via an approximation of the WOLA process as a time-domain convolution. The performance of the proposed algorithm is evaluated in the context of distributed noise reduction for the estimation of a target speech signal in an asynchronous WASN.

Comment: 9 pages, 6 figures
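Compensation by frequency-domain phase shifts can be illustrated on a single frame. The sketch assumes the SRO is already known (the coherence-drift estimator is not shown), models the accumulated drift as SRO times elapsed samples, and uses a plain DFT on a periodic test tone; the frame length, hop size, and SRO value are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def compensate_drift(frame, drift):
    """Undo a delay of `drift` samples with a linear phase shift per frequency bin."""
    n = len(frame)
    shifted = []
    for k, value in enumerate(dft(frame)):
        signed_k = k if k <= n // 2 else k - n   # map upper bins to negative frequencies
        shifted.append(value * cmath.exp(2j * math.pi * signed_k * drift / n))
    return [v.real for v in idft(shifted)]


# Hypothetical settings for illustration.
frame_len = 64
hop = 32
sro = 100e-6                      # 100 ppm relative sampling rate offset
frame_index = 200
drift = sro * frame_index * hop   # accumulated drift in samples (0.64 here)

reference = [math.sin(2 * math.pi * 5 * t / frame_len) for t in range(frame_len)]
drifted = [math.sin(2 * math.pi * 5 * (t - drift) / frame_len) for t in range(frame_len)]
restored = compensate_drift(drifted, drift)

error_before = max(abs(a - b) for a, b in zip(drifted, reference))
error_after = max(abs(a - b) for a, b in zip(restored, reference))
print(error_before, error_after)
```

Because the drift keeps growing with the frame index, it eventually exceeds one sample; phase shifts alone handle the fractional part well, which is consistent with the abstract's separate treatment of full-sample drifts.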

arXiv.org e-Print Archive